Finish Accumulators: a Deterministic Reduction Construct for Dynamic Task Parallelism
Authors
Abstract
Parallel reductions represent a common pattern for computing the aggregation of an associative and commutative operation, such as summation, across multiple pieces of data supplied by parallel tasks. In this paper, we introduce finish accumulators, a unified construct that supports predefined and user-defined deterministic reductions for dynamic async-finish task parallelism. Finish accumulators are designed to be integrated into terminally strict models of task parallelism, as in the X10 and Habanero-Java (HJ) languages, which are more general than the fully strict models of task parallelism found in Cilk and OpenMP. In contrast to lower-level reduction constructs such as atomic variables, the high-level semantics of finish accumulators allows for a wide range of implementations with different accumulation policies, e.g., eager computation vs. lazy computation. The best implementation can thus be selected based on a given application and the target platform that it will execute on. We have integrated finish accumulators into the Habanero-Java task parallel language, and used them in both research and teaching. In addition to their higher-level semantics, experimental results demonstrate that our Java-based implementation of finish accumulators delivers comparable or better performance for reductions relative to Java's atomic variables and concurrent collection libraries.
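To make the abstract's distinction between accumulation policies concrete, the sketch below models a SUM finish accumulator in plain Java with a "lazy" policy: each task deposits contributions into a striped partial (`java.util.concurrent.atomic.LongAdder`), and the final value is only combined when the enclosing finish scope completes. This is an illustrative sketch, not the HJ library's actual accumulator API; the class and method names (`SumAccumulator`, `put`, `resolve`, `get`) are hypothetical, and an `ExecutorService` with `awaitTermination` stands in for the implicit join of a `finish` block.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a SUM finish accumulator with a lazy
// accumulation policy: tasks add to a striped partial, and the
// result is materialized once, at the end of the finish scope.
class SumAccumulator {
    private final LongAdder partial = new LongAdder();
    private long result;
    private volatile boolean resolved = false;

    // Called by async tasks inside the finish scope.
    void put(long v) { partial.add(v); }

    // Called once, when the finish scope's join completes.
    void resolve() { result = partial.sum(); resolved = true; }

    // Deterministic: only readable after the finish scope ends.
    long get() {
        if (!resolved) throw new IllegalStateException("finish scope not complete");
        return result;
    }
}

public class AccumulatorDemo {
    public static void main(String[] args) throws InterruptedException {
        SumAccumulator ac = new SumAccumulator();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Analogue of: finish { for (i : 1..100) async { ac.put(i); } }
        for (int i = 1; i <= 100; i++) {
            final int v = i;
            pool.submit(() -> ac.put(v));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS); // the implicit join of finish
        ac.resolve();                                // value fixed at end of finish
        System.out.println(ac.get());                // prints 5050
    }
}
```

An "eager" policy would instead combine on every `put` (e.g., a single `AtomicLong.addAndGet`); the high-level semantics described in the abstract leave that choice to the implementation, since readers only observe the value after the finish scope ends.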
Similar resources
COMP 322: Fundamentals of Parallel Programming Module 1: Deterministic Shared-Memory Parallelism
1 Task-level Parallelism: 1.1 Task Creation and Termination (Async, Finish); 1.2 Computation Graphs; 1.3 Ideal Parallelism; 1.4 Multiprocessor Scheduling ...
Compiler Support for Work-Stealing Parallel Runtime Systems
Multiple programming models are emerging to address an increased need for dynamic task parallelism in multicore shared-memory multiprocessors. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Threading Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work-stealing, as embodied in Cilk’s implementation of dynamic spaw...
Work-First and Help-First Scheduling Policies for Terminally Strict Parallel Programs
Multiple programming models are emerging to address an increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel computing. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Threading Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work stealing, as embodied in...
Phaser Beams: Integrating Stream Parallelism with Task Parallelism
Current streaming languages place significant restrictions on the structure of parallelism that they support, and usually do not allow for dynamic task parallelism. In contrast, there are a number of task-parallel programming models that support dynamic parallelism but lack the ability to set up efficient streaming communications among dynamically varying sets of tasks. We address this gap by i...
Dynamic Task Parallelism with a GPU Work-Stealing Runtime System
NVIDIA’s Compute Unified Device Architecture (CUDA) and its attached C/C++ based API went a long way towards making GPUs more accessible to mainstream programming. So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason. The high number of computational cores and high memory bandwidth supported by the device makes ...